GH-38007: [C++] Add VariableShapeTensor implementation#38008
GH-38007: [C++] Add VariableShapeTensor implementation#38008rok wants to merge 62 commits intoapache:mainfrom
Conversation
eec16ec to
6287402
Compare
dff5441 to
9c3b464
Compare
|
Hi @rok! It would be great to have this in 15.0.0 (added the milestone for visibility). Do you think you have enough bandwidth to work on it at the moment? |
|
@AlenkaF yes I'd very much like to continue working on this, I was waiting for the release before chasing reviewers :D |
Co-authored-by: Antoine Pitrou <pitrou@free.fr> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
pitrou
left a comment
There was a problem hiding this comment.
Some comments. I haven't looked at the tests too closely.
| if(ARROW_TESTING) | ||
| add_library(arrow_compute_kernels_testing OBJECT test_util_internal.cc) | ||
| add_library(arrow_compute_kernels_testing OBJECT | ||
| test_util_internal.cc ../../extension/tensor_extension_array_test.cc) |
There was a problem hiding this comment.
Why do we need this? This doesn't sound logical.
| return Status::OK(); | ||
| } | ||
| ARROW_EXPORT | ||
| Status ComputeStrides(const std::shared_ptr<DataType>& value_type, |
There was a problem hiding this comment.
Could return Result<std::vector<int64_t>> for better ergonomics
| const auto size = static_cast<int64_t>(permutation.size()); | ||
| std::vector<uint8_t> dim_seen(size, 0); | ||
| ARROW_EXPORT | ||
| Status IsPermutationValid(const std::vector<int64_t>& permutation); |
There was a problem hiding this comment.
These helper functions all could take std::span<const int64_t> as inputs, though not important either.
| std::vector<int64_t>* strides) { | ||
| auto fixed_width_type = internal::checked_pointer_cast<FixedWidthType>(value_type); | ||
| if (permutation.empty()) { | ||
| return internal::ComputeRowMajorStrides(*fixed_width_type.get(), shape, strides); |
There was a problem hiding this comment.
This one is doing the exact same thing except no permutation, please reconcile the two functions instead of duplicating code.
| return false; | ||
| } | ||
|
|
||
| auto is_permutation_trivial = [](const std::vector<int64_t>& permutation) { |
There was a problem hiding this comment.
Again, please don't duplicate code like this, factor out into common helpers.
| for (const auto& x : document["permutation"].GetArray()) { | ||
| permutation.emplace_back(x.GetInt64()); | ||
| } |
There was a problem hiding this comment.
What happens if "permutation" is not an array of ints?
| ) | ||
| ]) | ||
| def test_tensor_type_str(tensor_type, text): | ||
| def test_tensor_type_str(tensor_type, text, pickle_module): |
There was a problem hiding this comment.
This change doesn't seem useful?
| assert result.to_tensor().shape == (1, 3, 2, 2) | ||
| assert result.to_tensor().strides == (12 * bw, 1 * bw, 6 * bw, 2 * bw) | ||
|
|
||
| tensor_type = pa.fixed_shape_tensor(arrow_type, [2, 2, 3], permutation=[2, 1, 0]) |
There was a problem hiding this comment.
I suppose it's unrelated but it's testing something useful anyway?
| const auto start_position = data_array->offset() * byte_width; | ||
| const auto size = std::accumulate(shape.begin(), shape.end(), static_cast<int64_t>(1), | ||
| std::multiplies<>()); | ||
| ARROW_CHECK_EQ(size * byte_width, data_array->length() * byte_width); |
There was a problem hiding this comment.
This can be evidently simplified.
Rationale for this change
We want to add VariableShapeTensor extension type definition for arrays containing tensors with variable shapes.
What changes are included in this PR?
This adds a C++ implementation.
Are these changes tested?
Yes.
Are there any user-facing changes?
This adds a new extension type C++.